WHAT IS CLAIMED IS:

1. A method for at least one of genotyping and haplotyping a sequence of 
polymorphic genetic loci in a deoxyribonucleic acid (DNA) sample or 
identifying a strain variant from the DNA sample, comprising: 

i) providing one or more microarrays that include a set of oligonucleotide
probes that are capable of detecting the at least one of the genotypes
and the haplotypes or the strain variant;
ii) hybridizing the DNA sample to the one or more microarrays to create a
hybridization pattern; and
iii) determining at least one of a genotype and a haplotype or a strain
variant based on the hybridization pattern.

2. The method of claim 1, wherein the one or more microarrays include a set of
oligonucleotide probes that are capable of detecting at least one of all known
genotypes and all known haplotypes at the polymorphic genetic loci or the
strain identification.

3. The method of claim 1, wherein the one or more microarrays are configured to
include at least one of an optimal set and an optimal arrangement of 
oligonucleotide probes. 

4. The method of claim 3, further comprising optimizing the at least one of the set
and arrangement of oligonucleotides based on the following:

$$\min \sum_j w_j \, E\left[ I_{\hat{T}_j \neq T_j} \right]$$

wherein $T_j$ is a true allele contained in the DNA sample, $\hat{T}_j$ is the allele
determined by the hybridization step,

$$I_x = \begin{cases} 1, & \text{if } x \text{ is true} \\ 0, & \text{otherwise} \end{cases}$$

and $w_j$ is a weight assigned to at least one of the genotype and the haplotype $j$.

5. The method of claim 4, wherein the weights are provided as follows:
$w_j = 1 \; \forall j$, wherein $\forall j$ is a set of at least one of all known genotypes and all
known haplotypes at one or more predetermined polymorphic genetic loci.

6. The method of claim 4, wherein the weights are provided as follows: $w_j$ is
different for each genotype or haplotype.
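
As an illustration of the objective recited in claims 4 to 6, the following Python sketch evaluates the weighted error sum for two candidate probe sets. It assumes, purely for illustration, that the per-haplotype misclassification probabilities have already been estimated; the function name `weighted_error`, the candidate labels, and all numbers are hypothetical and are not taken from the claims.

```python
def weighted_error(weights, error_probs):
    """Return sum_j w_j * Pr(T_hat_j != T_j) for one candidate probe set."""
    if len(weights) != len(error_probs):
        raise ValueError("one weight is required per genotype or haplotype")
    return sum(w * p for w, p in zip(weights, error_probs))

# Hypothetical misclassification probabilities for three haplotypes
# under two candidate probe sets.
candidates = {
    "set_a": [0.10, 0.05, 0.20],
    "set_b": [0.02, 0.30, 0.05],
}
uniform = [1.0, 1.0, 1.0]      # claim 5: w_j = 1 for all j
non_uniform = [3.0, 1.0, 0.5]  # claim 6: a different weight per haplotype

# Pick the candidate with the smallest weighted error under each weighting.
best_uniform = min(candidates, key=lambda k: weighted_error(uniform, candidates[k]))
best_weighted = min(candidates, key=lambda k: weighted_error(non_uniform, candidates[k]))
```

With these illustrative numbers the uniform weighting selects `set_a`, while the non-uniform weighting selects `set_b`, which shows how the choice of weights can change which probe set minimizes the objective.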

7. The method of claim 4, wherein step (iii) produces a vector of n 
measurements, wherein n is a number of probes contained on the one or more 
microarrays. 

8. The method of claim 7, wherein the n potential probes provided to identify N
known genotypes or haplotypes are each associated with a response vector
$v_j \in \{0,1\}^N$, $j = 1, \ldots, n$.

9. The method of claim 8, further comprising generating a graph G on vertices
corresponding to probe response vectors.

10. The method of claim 9, wherein the graph G is a complete edge-weighted and
vertex-weighted undirected graph G = (V, E) provided on n vertices, wherein n
is the number of potential probes.

11. The method of claim 10, wherein the weights w of each vertex v and each edge
e are constrained by: $0 \le w(v), w(e) \le 1$.

12. The method of claim 11, wherein the weight w of a vertex v is set to:
w(v) = min{fraction of 0's, fraction of 1's}.

13. The method of claim 11, wherein the weight w of an edge e = {u, v} is set to:

w(e) = Hamming distance/vector length,

wherein Hamming distance is measured between the probe response vectors
corresponding to vertices u and v, and vector length is the length of said probe
response vectors, namely, N.
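
The vertex and edge weights of claims 12 and 13 can be sketched in a few lines of Python. The probe names and response vectors below are hypothetical examples (n = 3 probes, N = 4 known haplotypes), and the helper names are assumptions introduced only for this illustration.

```python
from itertools import combinations

def vertex_weight(vec):
    """min{fraction of 0's, fraction of 1's} in a binary response vector."""
    ones = sum(vec) / len(vec)
    return min(ones, 1.0 - ones)

def edge_weight(u, v):
    """Hamming distance between two response vectors divided by their length N."""
    if len(u) != len(v):
        raise ValueError("response vectors must have the same length N")
    return sum(a != b for a, b in zip(u, v)) / len(u)

# Hypothetical response vectors: one entry per known haplotype.
responses = {
    "p1": (1, 0, 0, 0),
    "p2": (1, 1, 0, 0),
    "p3": (0, 0, 1, 1),
}
vertex_w = {name: vertex_weight(vec) for name, vec in responses.items()}
edge_w = {
    frozenset((a, b)): edge_weight(responses[a], responses[b])
    for a, b in combinations(responses, 2)
}
```

Both weights fall in the interval from 0 to 1, consistent with the constraint of claim 11. A probe that splits the haplotypes evenly (such as `p2`) receives the largest vertex weight, 0.5.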

14. The method of claim 10, further comprising modifying the graph G by
thresholding the edges such that the modified graph Gmod is defined as
$G_{mod} = (V, E_{mod})$, wherein $E_{mod} = \{e \in E : w(e) < p\}$, and p is a selected threshold
value.
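
A minimal sketch of the thresholding in claim 14 follows. The edge weights are hypothetical normalized Hamming distances on a complete graph of three probes, and `threshold_edges` is a name assumed for this example only.

```python
def threshold_edges(edge_weights, p):
    """Return E_mod = {e in E : w(e) < p} as a dict of retained edges."""
    return {edge: w for edge, w in edge_weights.items() if w < p}

# Hypothetical normalized Hamming distances between probe response vectors.
edge_weights = {
    ("p1", "p2"): 0.25,
    ("p1", "p3"): 0.75,
    ("p2", "p3"): 1.0,
}
e_mod = threshold_edges(edge_weights, p=0.5)
```

With p = 0.5 only the edge between `p1` and `p2` remains, so any independent set in the modified graph contains at most one of these two similar probes. The comparison is strict, matching w(e) < p.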

15. The method of claim 14, wherein, for the modified graph Gmod and the probe
set size M, the following is performed:

i) initializing a current-best list of independent sets with
associated information weights,

ii) initializing vertex boosting weights to vertex weights w(v),

iii) defining a probability distribution on the vertex subset based on
vertex boosting weights,

iv) choosing a random subset of vertices of a specified size M
based on the probability distribution,

v) eliminating one of the end-point vertices in each of the edges
remaining in the induced subgraph on the random subset,

vi) modifying the vertex boosting weights by increasing the
weights of the vertices that are retained in the subset and
decreasing the weights of the vertices that were selected in step
(iv) but eliminated in step (v), and

vii) repeating steps (iii) through (vi) for at least one of a
predetermined number of iterations and until no improvement
to the list of top independent sets is achieved.

16. The method of claim 15, wherein, for the modified graph Gmod and the probe 
set size M, steps (ii) through (vii) are repeated for a predetermined number of 
iterations, each iteration starting with reinitializing vertex boosting weights to 
vertex weights w(v) in step (ii). 
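
The randomized search of claims 15 and 16 can be sketched as follows. This is one possible reading under several assumptions that the claims do not specify: the information weight of an independent set is taken to be the sum of its vertex weights, the endpoint with the smaller vertex weight is the one eliminated, boosting weights are multiplied by 1.1 or 0.9, sampling is without replacement, and only a fixed iteration count is used as the stopping rule. All names and numbers are hypothetical.

```python
import random

def sample_subset(boost, size, rng):
    """Steps (iii)-(iv): draw distinct vertices with probability proportional to boost."""
    pool = dict(boost)
    chosen = set()
    while pool and len(chosen) < size:
        names = list(pool)
        pick = rng.choices(names, weights=[pool[v] for v in names], k=1)[0]
        chosen.add(pick)
        del pool[pick]
    return chosen

def search_independent_sets(vertex_w, edges_mod, size,
                            iterations=100, restarts=3, keep=5, seed=0):
    rng = random.Random(seed)
    best = {}  # step (i): independent set -> information weight
    for _ in range(restarts):  # claim 16: each restart reinitializes the boosts
        # Step (ii); the small floor keeps every sampling weight positive.
        boost = {v: max(w, 1e-9) for v, w in vertex_w.items()}
        for _ in range(iterations):  # step (vii), fixed count only
            subset = sample_subset(boost, size, rng)
            dropped = set()
            for u, v in edges_mod:  # step (v): break every remaining edge
                if u in subset and v in subset:
                    loser = u if vertex_w[u] <= vertex_w[v] else v
                    subset.discard(loser)
                    dropped.add(loser)
            for v in subset:  # step (vi): reward retained vertices
                boost[v] *= 1.1
            for v in dropped:  # step (vi): penalize eliminated vertices
                boost[v] *= 0.9
            best[frozenset(subset)] = sum(vertex_w[v] for v in subset)
    ranked = sorted(best.items(), key=lambda item: item[1], reverse=True)
    return ranked[:keep]

# Hypothetical vertex weights and thresholded edges for four probes.
vertex_w = {"p1": 0.25, "p2": 0.5, "p3": 0.5, "p4": 0.4}
edges_mod = [("p1", "p2"), ("p3", "p4")]
top_sets = search_independent_sets(vertex_w, edges_mod, size=3)
```

Because step (v) removes an endpoint of every edge whose two endpoints were both sampled, each returned set is independent in the thresholded graph. The update factors and the tie-breaking rule are design choices of this sketch and could be replaced without changing the overall structure.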

17. The method of claim 16, wherein, for a given fixed small $0 < \epsilon \ll 1$, the probe
set size M satisfies an inequality
$\Pr(\forall \text{ code pairs, Hamming distance} \ge 1) > 1 - \epsilon$.

18. The method of claim 16, wherein, for a given fixed small $0 < \epsilon \ll 1$ and a fixed
$\alpha > 1$, the probe set size M satisfies an inequality
$\Pr(\forall \text{ code pairs, Hamming distance} \ge \alpha) > 1 - \epsilon$.
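
To make the inequalities of claims 17 and 18 concrete, the sketch below computes, for a small hypothetical probe pool, the fraction of probe subsets of size M in which every pair of haplotype code words differs in at least α positions. Treating the probability as taken over uniformly chosen probe subsets is an assumption of this example, as are the function names and the response vectors.

```python
from itertools import combinations

def min_code_distance(responses, probes):
    """Smallest Hamming distance between the code words of any two haplotypes."""
    n_types = len(next(iter(responses.values())))
    # The code word of haplotype j is its response across the selected probes.
    codes = [tuple(responses[p][j] for p in probes) for j in range(n_types)]
    return min(
        sum(a != b for a, b in zip(x, y)) for x, y in combinations(codes, 2)
    )

def fraction_separating(responses, size, alpha=1):
    """Fraction of size-`size` probe subsets whose code words are all >= alpha apart."""
    subsets = list(combinations(sorted(responses), size))
    good = sum(min_code_distance(responses, s) >= alpha for s in subsets)
    return good / len(subsets)

# Hypothetical response vectors for four probes over N = 3 haplotypes.
responses = {
    "p1": (1, 0, 0),
    "p2": (0, 1, 0),
    "p3": (0, 0, 1),
    "p4": (1, 1, 0),
}
frac_m2 = fraction_separating(responses, size=2)           # alpha = 1
frac_m3 = fraction_separating(responses, size=3)           # alpha = 1
frac_m2_a2 = fraction_separating(responses, size=2, alpha=2)
```

In this toy pool, five of the six two-probe subsets distinguish all three haplotypes, every three-probe subset does, and no two-probe subset reaches a minimum distance of 2. A larger M is therefore needed as ε shrinks or α grows.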

19. The method of claim 15, wherein the threshold p has a value to enable the
graph G to have a sparsity bounded by $A \le \text{sparsity} \le B$, wherein the sparsity is
definable by the average degree of a vertex in the graph G.

20. The method of claim 19, wherein the lower bound A is a relatively small
constant, and the upper bound B is a function of the number of vertices n.
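
One way to read claims 19 and 20 is as a rule for selecting the threshold p. The following sketch scans a list of candidate thresholds and returns the first whose thresholded graph has an average degree between the bounds A and B. The bounds, the candidate list, and the edge weights are hypothetical; the claims do not state how B depends on n, so it is passed in as a plain number here.

```python
def average_degree(num_vertices, edges):
    """Average vertex degree of an undirected graph: 2|E| / n."""
    return 2 * len(edges) / num_vertices

def pick_threshold(num_vertices, edge_weights, lower, upper, candidates):
    """Return the smallest candidate p with lower <= sparsity <= upper, else None."""
    for p in sorted(candidates):
        kept = [e for e, w in edge_weights.items() if w < p]
        if lower <= average_degree(num_vertices, kept) <= upper:
            return p
    return None

# Hypothetical edge weights on a complete graph with n = 4 vertices.
edge_weights = {
    (0, 1): 0.1, (0, 2): 0.3, (1, 2): 0.5,
    (2, 3): 0.7, (0, 3): 0.9, (1, 3): 0.6,
}
p_selected = pick_threshold(4, edge_weights, lower=1, upper=2,
                            candidates=[0.2, 0.4, 0.6, 0.8])
```

Here p = 0.4 is selected because it keeps two edges, giving an average degree of exactly 1. If no candidate satisfies the bounds the function returns `None`, signalling that the bounds or the candidate grid should be revisited.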

21. A software arrangement which, when executed on a processing device,
configures the processing device to perform the steps comprising:

i) hybridizing the DNA sample to one or more microarrays to create a
hybridization pattern, the one or more microarrays including a set of
oligonucleotide probes that are capable of detecting at least one set of
genotypes and haplotypes for a sequence of polymorphic genetic loci
in a deoxyribonucleic acid (DNA) sample or identifying a strain
variant from the DNA sample; and

ii) determining at least one of a genotype and a haplotype or a strain 
variant based on the hybridization pattern. 
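
The determining step can be illustrated with a simple decoder. The sketch assumes that the hybridization pattern has already been binarized into one 0/1 measurement per probe and that the genotype is called by choosing the known genotype whose expected responses are nearest in Hamming distance. Both assumptions, the genotype labels, and the function name are introduced for illustration and are not recited in the claims.

```python
def call_genotype(pattern, expected):
    """Return (label, distance) of the known genotype closest to the pattern."""
    def hamming(a, b):
        return sum(x != y for x, y in zip(a, b))
    # Ties are resolved in favor of the genotype listed first in `expected`.
    label = min(expected, key=lambda g: hamming(pattern, expected[g]))
    return label, hamming(pattern, expected[label])

# Hypothetical expected probe responses (n = 4 probes) for three genotypes.
expected = {
    "AA": (1, 0, 1, 0),
    "AG": (1, 1, 0, 0),
    "GG": (0, 1, 0, 1),
}
observed = (1, 0, 1, 1)  # hypothetical pattern with one noisy probe
result = call_genotype(observed, expected)
```

The observed pattern differs from the expected `AA` responses in a single probe and from the other two genotypes in three probes, so it is called as `AA`. A probe set whose code words are further apart tolerates more such measurement errors.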

22. The software arrangement of claim 21, wherein the one or more microarrays
include a set of oligonucleotide probes that are capable of detecting at least
one of all known genotypes and all known haplotypes at the polymorphic
genetic loci or strain variation.

23. The software arrangement of claim 21, wherein the one or more microarrays
are configured to include at least one of an optimal set and an optimal
arrangement of oligonucleotide probes.

24. The software arrangement of claim 23, wherein the processing device is
further configured to optimize the at least one of the set and arrangement of
oligonucleotides based on the following:

$$\min \sum_j w_j \, E\left[ I_{\hat{T}_j \neq T_j} \right]$$

wherein $T_j$ is a true allele contained in the DNA sample, $\hat{T}_j$ is the allele
determined by the hybridization step,

$$I_x = \begin{cases} 1, & \text{if } x \text{ is true} \\ 0, & \text{otherwise} \end{cases}$$

and $w_j$ is a weight assigned to at least one of the genotype and the haplotype $j$.

25. The software arrangement of claim 24, wherein the weights are provided as
follows: $w_j = 1 \; \forall j$, wherein $\forall j$ is a set of at least one of all known
genotypes and all known haplotypes at one or more predetermined
polymorphic genetic loci.

26. The software arrangement of claim 24, wherein the weights are provided as
follows: $w_j$ is different for each genotype or haplotype.

27. The software arrangement of claim 26, wherein step (i) produces a vector of n
measurements, wherein n is a number of probes contained on the microarray.

28. The software arrangement of claim 26, wherein the n potential probes
provided to identify N known genotypes or haplotypes are each associated
with a response vector $v_j \in \{0,1\}^N$, $j = 1, \ldots, n$.

29. The software arrangement of claim 28, wherein the processing device is
further configured to generate a graph G on vertices corresponding to probe
response vectors.

30. The software arrangement of claim 29, wherein the graph G is a complete
edge-weighted and vertex-weighted undirected graph G = (V, E) provided on n
vertices, wherein n is the number of potential probes.

31. The software arrangement of claim 30, wherein the weights w of each vertex v
and each edge e are constrained by: $0 \le w(v), w(e) \le 1$.

32. The software arrangement of claim 31, wherein the weight w of a vertex v is
set to:

w(v) = min{fraction of 0's, fraction of 1's}/100.

33. The software arrangement of claim 31, wherein the weight w of an edge e
= {u, v} is set to:

w(e) = Hamming distance/vector length,

wherein Hamming distance is measured between the probe response vectors
corresponding to vertices u and v, and vector length is the length of said probe
response vectors, namely, N.

34. The software arrangement of claim 30, wherein the processing device is
further configured to modify the graph G by thresholding the edges such that
the modified graph Gmod is defined as $G_{mod} = (V, E_{mod})$, wherein
$E_{mod} = \{e \in E : w(e) < p\}$, and p is a selected threshold value.

35. The software arrangement of claim 34, wherein, for the modified graph Gmod 
and the probe set size M, the following is performed: 

i) initializing a current-best list of independent sets with 
associated information weights, 

ii) initializing vertex boosting weights to vertex weights w(v),

iii) defining a probability distribution on the vertex subset based on 
vertex boosting weights, 

iv) choosing a random subset of vertices of a specified size M 
based on the probability distribution, 

v) eliminating one of the end-point vertices in each of the edges 
remaining in the induced subgraph on the random subset, 

vi) modifying the vertex boosting weights by increasing the 
weights of the vertices that are retained in the subset and 
decreasing the weights of the vertices that were selected in step 
(iv) but eliminated in step (v), and

vii) repeating steps (iii) through (vi) for at least one of a
predetermined number of iterations and until no improvement 
to the list of top independent sets is achieved. 

36. The software arrangement of claim 35, wherein, for the modified graph Gmod
and the probe set size M, steps (ii) through (vii) are repeated for a
predetermined number of iterations, each iteration starting with reinitializing
vertex boosting weights to vertex weights w(v) in step (ii).

37. The software arrangement of claim 36, wherein, for a given fixed small
$0 < \epsilon \ll 1$, the probe set size M satisfies an inequality
$\Pr(\forall \text{ code pairs, Hamming distance} \ge 1) > 1 - \epsilon$.

38. The software arrangement of claim 36, wherein, for a given fixed small
$0 < \epsilon \ll 1$ and a fixed $\alpha > 1$, the probe set size M satisfies an inequality
$\Pr(\forall \text{ code pairs, Hamming distance} \ge \alpha) > 1 - \epsilon$.

39. The software arrangement of claim 36, wherein the threshold p has a value to
enable the graph G to have a sparsity bounded by $A \le \text{sparsity} \le B$, wherein the
sparsity is definable by the average degree of a vertex in the graph G.

40. The software arrangement of claim 39, wherein the lower bound A is a
relatively small constant, and the upper bound B is a function of the number of
vertices n.

41. A storage medium which includes thereon a software arrangement for
providing one or more microarrays, which is capable of configuring a
processing arrangement to perform the steps comprising:

i) receiving information regarding a hybridization of the DNA sample to
one or more microarrays to create a hybridization pattern, the one or
more microarrays including a set of oligonucleotide probes that are
capable of detecting at least one set of genotypes and haplotypes for a
sequence of polymorphic genetic loci in a deoxyribonucleic acid
(DNA) sample or identifying a strain variant from the DNA sample;
and

ii) determining at least one of a genotype and a haplotype or a strain
variant based on the hybridization pattern.

42. The storage medium of claim 41, wherein the one or more microarrays include
a set of oligonucleotide probes that are capable of detecting at least one of all
known genotypes and all known haplotypes at the polymorphic genetic loci or
strain variation.

43. The storage medium of claim 42, wherein the one or more microarrays are
configured to include at least one of an optimal set and an optimal 
arrangement of oligonucleotide probes. 

44. The storage medium of claim 43, wherein the processing device is further
configured to optimize the at least one of the set and arrangement of
oligonucleotides based on the following:

$$\min \sum_j w_j \, E\left[ I_{\hat{T}_j \neq T_j} \right] \;\Leftrightarrow\; \min \sum_j w_j \Pr(\hat{T}_j \neq T_j)$$

wherein $T_j$ is a true allele contained in the DNA sample, $\hat{T}_j$ is the allele
determined by the hybridization step,

$$I_x = \begin{cases} 1, & \text{if } x \text{ is true} \\ 0, & \text{otherwise} \end{cases}$$

and $w_j$ is a weight assigned to at least one of the genotype and the haplotype $j$.



45. The storage medium of claim 44, wherein the weights are provided as follows:
$w_j = 1 \; \forall j$, wherein $\forall j$ is a set of at least one of all known genotypes and all
known haplotypes at one or more predetermined polymorphic genetic loci.

46. The storage medium of claim 44, wherein the weights are provided as follows:
$w_j$ is different for each genotype or haplotype.

47. The storage medium of claim 44, wherein step (i) produces a vector of n
measurements, wherein n is a number of probes contained on the microarray.

48. The storage medium of claim 46, wherein the n potential probes provided to
identify N known genotypes or haplotypes are each associated with a response
vector $v_j \in \{0,1\}^N$, $j = 1, \ldots, n$.

49. The storage medium of claim 48, wherein the processing device is further
configured to generate a graph G on vertices corresponding to probe response
vectors.

50. The storage medium of claim 49, wherein the graph G is a complete
edge-weighted and vertex-weighted undirected graph G = (V, E) provided on n
vertices, wherein n is the number of potential probes.

51. The storage medium of claim 50, wherein the weights w of each vertex v and
each edge e are constrained by: $0 \le w(v), w(e) \le 1$.

52. The storage medium of claim 51, wherein the weight w of a vertex v is set to:

w(v) = min{fraction of 0's, fraction of 1's}.

53. The storage medium of claim 52, wherein the weight w of an edge e = {u, v} is
set to:

w(e) = Hamming distance/vector length,

wherein Hamming distance is measured between the probe response vectors
corresponding to vertices u and v, and vector length is the length of said probe
response vectors, namely, N.

54. The storage medium of claim 53, wherein the processing device is further
configured to modify the graph G by thresholding the edges such that the
modified graph Gmod is defined as $G_{mod} = (V, E_{mod})$, wherein
$E_{mod} = \{e \in E : w(e) < p\}$, and p is a selected threshold value.

55. The storage medium of claim 54, wherein, for the modified graph Gmod and the
probe set size M, the following is performed:

i) initializing a current-best list of independent sets with
associated information weights,

ii) initializing vertex boosting weights to vertex weights w(v),

iii) defining a probability distribution on the vertex subset based on
vertex boosting weights,

iv) choosing a random subset of vertices of a specified size M
based on the probability distribution,

v) eliminating one of the end-point vertices in each of the edges
remaining in the induced subgraph on the random subset,

vi) modifying the vertex boosting weights by increasing the
weights of the vertices that are retained in the subset and
decreasing the weights of the vertices that were selected in step
(iv) but eliminated in step (v), and

vii) repeating steps (iii) through (vi) for at least one of a
predetermined number of iterations and until no improvement
to the list of top independent sets is achieved.

56. The storage medium of claim 54, wherein, for the modified graph Gmod and the
probe set size M, steps (ii) through (vii) are repeated for a predetermined
number of iterations, each iteration starting with reinitializing vertex boosting
weights to vertex weights w(v) in step (ii).

57. The storage medium of claim 54, wherein, for a given fixed small $0 < \epsilon \ll 1$,
the probe set size M satisfies an inequality
$\Pr(\forall \text{ code pairs, Hamming distance} \ge 1) > 1 - \epsilon$.

58. The storage medium of claim 54, wherein, for a given fixed small $0 < \epsilon \ll 1$
and a fixed $\alpha > 1$, the probe set size M satisfies an inequality
$\Pr(\forall \text{ code pairs, Hamming distance} \ge \alpha) > 1 - \epsilon$.

59. The storage medium of claim 54, wherein the threshold p has a value to enable
the graph G to have a sparsity bounded by $A \le \text{sparsity} \le B$, wherein the
sparsity is definable by the average degree of a vertex in the graph G.

60. The storage medium of claim 59, wherein the lower bound A is a relatively
small constant, and the upper bound B is a function of the number of vertices
n.

61. A system for at least one of genotyping and haplotyping polymorphic genetic
loci or strain identification in a deoxyribonucleic acid (DNA) sample,
comprising:

a processing arrangement which is capable of being programmed to:

i) receive information regarding a hybridization of the DNA sample to
one or more microarrays to create a hybridization pattern, the one or
more microarrays including a set of oligonucleotide probes that are
capable of detecting at least one set of genotypes and haplotypes for a
sequence of polymorphic genetic loci in a deoxyribonucleic acid
(DNA) sample or identifying a strain variant from the DNA sample;
and 

ii) determine at least one of a genotype and a haplotype or a strain variant 
based on the hybridization pattern. 

62. The system of claim 61, wherein the one or more microarrays include a set of
oligonucleotide probes that are capable of detecting at least one of all known 
genotypes and all known haplotypes at the polymorphic genetic loci or strain 
variation. 

63. The system of claim 61, wherein the one or more microarrays are configured
to include at least one of an optimal set and an optimal arrangement of 
oligonucleotide probes. 

64. The system of claim 63, wherein the processing arrangement is further
programmed to optimize the at least one of the set and arrangement of
oligonucleotides based on the following:

$$\min \sum_j w_j \, E\left[ I_{\hat{T}_j \neq T_j} \right]$$

wherein $T_j$ is a true allele contained in the DNA sample, $\hat{T}_j$ is the allele
determined by the hybridization step,

$$I_x = \begin{cases} 1, & \text{if } x \text{ is true} \\ 0, & \text{otherwise} \end{cases}$$

and $w_j$ is a weight assigned to at least one of the genotype and the haplotype $j$.



65. The system of claim 64, wherein the weights are provided as follows:
$w_j = 1 \; \forall j$, wherein $\forall j$ is a set of at least one of all known genotypes and all
known haplotypes at one or more predetermined polymorphic genetic loci.

66. The system of claim 64, wherein the weights are provided as follows: $w_j$ is
different for each genotype or haplotype.

67. The system of claim 64, wherein step (i) produces a vector of n measurements,
wherein n is a number of probes contained on the one or more microarrays.

68. The system of claim 66, wherein the n potential probes provided to identify N
known genotypes or haplotypes are each associated with a response vector
$v_j \in \{0,1\}^N$, $j = 1, \ldots, n$.

69. The system of claim 68, wherein the processing arrangement is further
programmed to generate a graph G on vertices corresponding to probe
response vectors.

70. The system of claim 69, wherein the graph G is a complete edge-weighted and 
vertex-weighted undirected graph G = (V, E) provided on n vertices, wherein n
is the number of potential probes. 

71. The system of claim 70, wherein the weights w of each vertex v and each edge
e are constrained by: $0 \le w(v), w(e) \le 1$.

72. The system of claim 71, wherein the weight w of a vertex v is set to: 

w(v) = min{fraction of 0's, fraction of 1's}.

73. The system of claim 71, wherein the weight w of an edge e = {u, v} is set to:

w(e) = Hamming distance/vector length,

wherein Hamming distance is measured between the probe response vectors
corresponding to vertices u and v, and vector length is the length of said probe
response vectors, namely, N.

74. The system of claim 70, wherein the processing arrangement is further
programmed to modify the graph G by thresholding the edges such that the
modified graph Gmod is defined as $G_{mod} = (V, E_{mod})$, wherein
$E_{mod} = \{e \in E : w(e) < p\}$, and p is a selected threshold value.

75. The system of claim 74, wherein, for the modified graph Gmod and the probe
set size M, the following is performed:

i) initializing a current-best list of independent sets with
associated information weights,

ii) initializing vertex boosting weights to vertex weights w(v),

iii) defining a probability distribution on the vertex subset based on
vertex boosting weights,

iv) choosing a random subset of vertices of a specified size M
based on the probability distribution,

v) eliminating one of the end-point vertices in each of the edges
remaining in the induced subgraph on the random subset,

vi) modifying the vertex boosting weights by increasing the
weights of the vertices that are retained in the subset and
decreasing the weights of the vertices that were selected in step
(iv) but eliminated in step (v), and

vii) repeating steps (iii) through (vi) for at least one of a
predetermined number of iterations and until no improvement
to the list of top independent sets is achieved.

76. The system of claim 75, wherein, for the modified graph Gmod and the probe
set size M, steps (ii) through (vii) are repeated for a predetermined number of
iterations, each iteration starting with reinitializing vertex boosting weights to
vertex weights w(v) in step (ii).

77. The system of claim 76, wherein, for a given fixed small $0 < \epsilon \ll 1$, the probe
set size M satisfies an inequality
$\Pr(\forall \text{ code pairs, Hamming distance} \ge 1) > 1 - \epsilon$.
78. The system of claim 76, wherein, for a given fixed small $0 < \epsilon \ll 1$ and a fixed
$\alpha > 1$, the probe set size M satisfies an inequality
$\Pr(\forall \text{ code pairs, Hamming distance} \ge \alpha) > 1 - \epsilon$.

79. The system of claim 76, wherein the threshold p has a value to enable the
graph G to have a sparsity bounded by $A \le \text{sparsity} \le B$, wherein the sparsity is
definable by the average degree of a vertex in the graph G.

80. The system of claim 79, wherein the lower bound A is a relatively small
constant, and the upper bound B is a function of the number of vertices n.